Results 1 - 11 of 11
1.
Nature ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38718835

ABSTRACT

The introduction of AlphaFold 2 [1] has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design [2-6]. In this paper, we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture, which is capable of joint structure prediction of complexes including proteins, nucleic acids, small molecules, ions, and modified residues. The new AlphaFold model demonstrates significantly improved accuracy over many previous specialised tools: far greater accuracy on protein-ligand interactions than state-of-the-art docking tools, much higher accuracy on protein-nucleic acid interactions than nucleic-acid-specific predictors, and significantly higher antibody-antigen prediction accuracy than AlphaFold-Multimer v2.3 [7,8]. Together these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep learning framework.

2.
ArXiv ; 2023 May 26.
Article in English | MEDLINE | ID: mdl-37292483

ABSTRACT

Directed evolution of proteins has been the most effective method for protein engineering. However, a new paradigm is emerging, fusing the library generation and screening approaches of traditional directed evolution with computation through the training of machine learning models on protein sequence fitness data. This chapter highlights successful applications of machine learning to protein engineering and directed evolution, organized by the improvements that have been made with respect to each step of the directed evolution cycle. Additionally, we provide an outlook for the future based on the current direction of the field, namely in the development of calibrated models and in incorporating other modalities, such as protein structure.

3.
Nature ; 596(7873): 590-596, 2021 08.
Article in English | MEDLINE | ID: mdl-34293799

ABSTRACT

Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold2, at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.


Subjects
Computational Biology/standards , Deep Learning/standards , Models, Molecular , Protein Conformation , Proteome/chemistry , Datasets as Topic/standards , Diacylglycerol O-Acyltransferase/chemistry , Glucose-6-Phosphatase/chemistry , Humans , Membrane Proteins/chemistry , Protein Folding , Reproducibility of Results
4.
Curr Opin Chem Biol ; 65: 18-27, 2021 12.
Article in English | MEDLINE | ID: mdl-34051682

ABSTRACT

Protein engineering seeks to identify protein sequences with optimized properties. When guided by machine learning, protein sequence generation methods can draw on prior knowledge and experimental efforts to improve this process. In this review, we highlight recent applications of machine learning to generate protein sequences, focusing on the emerging field of deep generative methods.


Subjects
Machine Learning , Protein Engineering , Amino Acid Sequence
5.
Curr Opin Struct Biol ; 69: 11-18, 2021 08.
Article in English | MEDLINE | ID: mdl-33647531

ABSTRACT

Machine learning (ML) can expedite directed evolution by allowing researchers to move expensive experimental screens in silico. Gathering sequence-function data for training ML models, however, can still be costly. In contrast, raw protein sequence data is widely available. Recent advances in ML approaches use protein sequences to augment limited sequence-function data for directed evolution. We highlight contributions in a growing effort to use sequences to reduce or eliminate the amount of sequence-function data needed for effective in silico screening. We also highlight approaches that use ML models trained on sequences to generate new functional sequence diversity, focusing on strategies that use these generative models to efficiently explore vast regions of protein space.


Subjects
Machine Learning , Proteins , Amino Acid Sequence , Computer Simulation , Proteins/genetics
6.
ACS Synth Biol ; 9(8): 2154-2161, 2020 08 21.
Article in English | MEDLINE | ID: mdl-32649182

ABSTRACT

Short (15-30 residue) chains of amino acids at the amino termini of expressed proteins known as signal peptides (SPs) specify secretion in living cells. We trained an attention-based neural network, the Transformer model, on data from all available organisms in Swiss-Prot to generate SP sequences. Experimental testing demonstrates that the model-generated SPs are functional: when appended to enzymes expressed in an industrial Bacillus subtilis strain, the SPs lead to secreted activity that is competitive with industrially used SPs. Additionally, the model-generated SPs are diverse in sequence, sharing as little as 58% sequence identity to the closest known native signal peptide and 73% ± 9% on average.
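The sequence-identity figures quoted above can be illustrated with a minimal sketch. The abstract does not say how identity was computed, so this assumes two sequences that have already been aligned to equal length (with `-` marking gaps) and counts matches over columns where neither sequence has a gap. The function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of two pre-aligned sequences.

    Assumption: both inputs come from an existing pairwise alignment,
    have equal length, and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Made-up toy sequences, not real signal peptides.
    print(percent_identity("MKKL", "MKRL"))
```

Other identity definitions exist (for example, dividing by the length of the shorter sequence), so values from this sketch are not directly comparable with the reported ones.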


Subjects
Machine Learning , Protein Sorting Signals , Area Under Curve , Bacillus subtilis/metabolism , Bacterial Proteins/metabolism , Databases, Protein , ROC Curve
7.
Nat Methods ; 16(8): 687-694, 2019 08.
Article in English | MEDLINE | ID: mdl-31308553

ABSTRACT

Protein engineering through machine-learning-guided directed evolution enables the optimization of protein functions. Machine-learning approaches predict how sequence maps to function in a data-driven manner without requiring a detailed model of the underlying physics or biological pathways. Such methods accelerate directed evolution by learning from the properties of characterized variants and using that information to select sequences that are likely to exhibit improved properties. Here we introduce the steps required to build machine-learning sequence-function models and to use those models to guide engineering, making recommendations at each stage. This review covers basic concepts relevant to the use of machine learning for protein engineering, as well as the current literature and applications of this engineering paradigm. We illustrate the process with two case studies. Finally, we look to future opportunities for machine learning to enable the discovery of unknown protein functions and uncover the relationship between protein sequence and function.


Subjects
Algorithms , Directed Molecular Evolution , Machine Learning , Models, Biological , Protein Engineering/methods , Proteins/metabolism , Humans , Proteins/genetics
8.
Proc Natl Acad Sci U S A ; 116(18): 8852-8858, 2019 04 30.
Article in English | MEDLINE | ID: mdl-30979809

ABSTRACT

To reduce experimental effort associated with directed protein evolution and to explore the sequence space encoded by mutating multiple positions simultaneously, we incorporate machine learning into the directed evolution workflow. Combinatorial sequence space can be quite expensive to sample experimentally, but machine-learning models trained on tested variants provide a fast method for testing sequence space computationally. We validated this approach on a large published empirical fitness landscape for human GB1 binding protein, demonstrating that machine learning-guided directed evolution finds variants with higher fitness than those found by other directed evolution approaches. We then provide an example application in evolving an enzyme to produce each of the two possible product enantiomers (i.e., stereodivergence) of a new-to-nature carbene Si-H insertion reaction. The approach predicted libraries enriched in functional enzymes and fixed seven mutations in two rounds of evolution to identify variants for selective catalysis with 93% and 79% ee (enantiomeric excess). By greatly increasing throughput with in silico modeling, machine learning enhances the quality and diversity of sequence solutions for a protein engineering problem.
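The workflow described above (train a model on tested variants, then screen the combinatorial space computationally) might be sketched as follows. This is a toy illustration under stated assumptions, not the authors' method: it swaps in a simple additive per-position model, and the function names (`fit_additive_model`, `predict`, `rank_untested`) and the example data are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def fit_additive_model(measured):
    """Fit a toy additive model: the mean fitness plus an average
    deviation for each (position, residue) pair seen in the data."""
    mean = sum(measured.values()) / len(measured)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for seq, fitness in measured.items():
        for pos, aa in enumerate(seq):
            sums[(pos, aa)] += fitness - mean
            counts[(pos, aa)] += 1
    effects = {key: sums[key] / counts[key] for key in sums}
    return mean, effects


def predict(model, seq):
    """Predicted fitness; unseen (position, residue) pairs contribute 0."""
    mean, effects = model
    return mean + sum(effects.get((pos, aa), 0.0)
                      for pos, aa in enumerate(seq))


def rank_untested(model, residues_per_position, measured, top_k):
    """Enumerate the combinatorial library in silico and return the
    top_k untested variants by predicted fitness."""
    library = ("".join(combo) for combo in product(*residues_per_position))
    untested = [seq for seq in library if seq not in measured]
    return sorted(untested, key=lambda s: predict(model, s),
                  reverse=True)[:top_k]


if __name__ == "__main__":
    # Made-up fitness values for a two-position library.
    measured = {"AA": 1.0, "AV": 2.0, "VA": 3.0}
    model = fit_additive_model(measured)
    print(rank_untested(model, [("A", "V"), ("A", "V")], measured, top_k=1))
```

An additive model ignores interactions between positions; it is used here only because it is short enough to read at a glance. The top-ranked variants would then be tested experimentally and added to the training data for the next round.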


Subjects
Combinatorial Chemistry Techniques/methods , Directed Molecular Evolution , Machine Learning , Oxygenases/genetics , Rhodothermus/enzymology , Small Molecule Libraries , Amino Acid Sequence , Humans , Models, Molecular , Oxygenases/metabolism , Protein Conformation
9.
Oncol Lett ; 16(5): 6763-6769, 2018 Nov.
Article in English | MEDLINE | ID: mdl-30405820

ABSTRACT

Sigma-1 receptor (sigma-1R), a 25-kDa integral membrane protein, is expressed at a high density in various tumor cell lines and its ligands mediate tumor cell proliferation. However, the effect of this receptor on proliferation and the associated intracellular molecules in tumors remains unclear. The present study aimed to investigate the effect of sigma-1R overexpression on MCF-7 cell proliferation and the associated intracellular molecules that serve a key role in this process. The sigma-1R proliferative function was examined by comparing the proliferation rates of a sigma-1R-overexpressing line, MCF-41, with a sigma-1R-defective line, MCF-7, in culture media with various serum concentrations. The results demonstrated that MCF-41 cells grew significantly faster compared with MCF-7 cells, indicating a proliferation-enhancing receptor function. This proliferation-enhancing effect was completely eliminated by adding a PKC inhibitor to the culture media for MCF-41 cells. To identify which PKC subtype affects the proliferative function of sigma-1R, five inhibitors of PKC subtypes or enzymes involved in the PKC signaling cascade were introduced to MCF-7 and MCF-41 cell culture media and their effects on cell proliferation were compared. It was revealed that only the classic PKC subtype inhibitor, GF109203X, significantly inhibited MCF-41 cell proliferation compared with the MCF-7 line. In conclusion, among PKC iso-enzymes only classic PKC subtype enzymes serve an important role in sigma-1R overexpression enhancing MCF-7 cell proliferation.

10.
11.
Bioinformatics ; 34(15): 2642-2648, 2018 08 01.
Article in English | MEDLINE | ID: mdl-29584811

ABSTRACT

Motivation: Machine-learning models trained on protein sequences and their measured functions can infer biological properties of unseen sequences without requiring an understanding of the underlying physical or biological mechanisms. Such models enable the prediction and discovery of sequences with optimal properties. Machine-learning models generally require that their inputs be vectors, and the conversion from a protein sequence to a vector representation affects the model's ability to learn. We propose to learn embedded representations of protein sequences that take advantage of the vast quantity of unmeasured protein sequence data available. These embeddings are low-dimensional and can greatly simplify downstream modeling. Results: The predictive power of Gaussian process models trained using embeddings is comparable to that of models trained on existing representations, which suggests that embeddings enable accurate predictions despite having orders of magnitude fewer dimensions. Moreover, embeddings are simpler to obtain because they do not require alignments, structural data, or selection of informative amino-acid properties. Visualizing the embedding vectors shows that meaningful relationships between the embedded proteins are captured. Availability and implementation: The embedding vectors and code to reproduce the results are available at https://github.com/fhalab/embeddings_reproduction/. Supplementary information: Supplementary data are available at Bioinformatics online.
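To make the point about vector inputs concrete, here is a minimal sketch of one common sequence-to-vector conversion, one-hot encoding. Treating one-hot as one of the "existing representations" is an assumption; the abstract does not name them. The function name `one_hot` and the example sequence are hypothetical, and the learned embeddings themselves are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(sequence: str) -> list[int]:
    """Flatten a protein sequence into a 0/1 vector of length
    len(sequence) * 20, with a single 1 per residue."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    vector = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[index[aa]] = 1  # raises KeyError for non-standard residues
        vector.extend(row)
    return vector


if __name__ == "__main__":
    encoded = one_hot("MKV")  # made-up three-residue example
    print(len(encoded))  # 3 residues * 20 = 60 dimensions
```

The vector grows by 20 dimensions per residue, so a protein of a few hundred residues already needs thousands of dimensions, which is the kind of gap a low-dimensional embedding is meant to close.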


Subjects
Computational Biology/methods , Machine Learning , Models, Biological , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Amino Acid Sequence , Bacteria/metabolism , Eukaryota/metabolism , Humans , Proteins/metabolism , Proteins/physiology